{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1fe6e643-9453-4381-9445-bd471685fb96",
   "metadata": {},
   "source": [
    "# Labeling the [math](https://huggingface.co/datasets/math_dataset) dataset using Autolabel\n",
    "\n",
    "This is a multi-class classification task where the inputs are high school math questions, and the goal is to classify each question into one of 5 categories: algebra, arithmetic, measurement, numbers, and probability."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aacac4ae-c7f9-4dee-be3a-a2a6bfa099fa",
   "metadata": {},
   "source": [
    "## Install Autolabel\n",
    "Plus, set up your OpenAI API key, since we'll be using `gpt-3.5-turbo` as our LLM for labeling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3dc19059-2f63-44b7-9a32-8b38a11249aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install 'refuel-autolabel[openai]'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fbdeca2f-dd20-4634-b3f8-1b3ed45c4705",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# provide your own OpenAI API key here\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-\"\n"
   ]
  },
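  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0001-4d2b-8a6e-1b2c3d4e5f01",
   "metadata": {},
   "source": [
    "Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. The sketch below assumes you run the notebook interactively. It prompts for the key only when `OPENAI_API_KEY` is empty or still holds the `sk-` placeholder. `ensure_openai_key` is a hypothetical helper and is not part of Autolabel.\n",
    "\n",
    "```python\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "\n",
    "def ensure_openai_key(prompt=getpass):\n",
    "    # Hypothetical helper: an empty value or the bare 'sk-' placeholder counts as unset\n",
    "    key = os.environ.get('OPENAI_API_KEY', '')\n",
    "    if key in ('', 'sk-'):\n",
    "        # getpass reads the key without echoing it into the notebook output\n",
    "        key = prompt('OpenAI API key: ')\n",
    "        os.environ['OPENAI_API_KEY'] = key\n",
    "    return key\n",
    "```\n",
    "\n",
    "Call `ensure_openai_key()` before creating the `LabelingAgent` to keep the key out of the saved notebook. The `prompt` argument exists only so the function can be exercised without typing."
   ]
  },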
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8eb09918-494a-42ef-86a7-df93b3ce4284",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Download the dataset\n",
    "\n",
    "This dataset can be downloaded directly with Autolabel's `get_data` helper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b4c116ce-3294-45c2-9158-eac929f03a95",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading example dataset from https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/math/seed.csv to seed.csv...\n",
      "Downloading example dataset from https://autolabel-benchmarking.s3.us-west-2.amazonaws.com/math/test.csv to test.csv...\n",
      "100% [........................................] [159177/159177] bytes\r"
     ]
    }
   ],
   "source": [
    "from autolabel import get_data\n",
    "\n",
    "get_data(\"math\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5f038fcc-9ec0-4f2d-b84d-2e963656c6bd",
   "metadata": {},
   "source": [
    "This downloads two datasets:\n",
    "* `test.csv`: This is the larger dataset we are trying to label using LLMs\n",
    "* `seed.csv`: This is a small dataset where we already have human-provided labels"
   ]
  },
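  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0002-4d2b-8a6e-1b2c3d4e5f02",
   "metadata": {},
   "source": [
    "Before labeling, it can be useful to check how the human-provided labels in `seed.csv` are distributed. The sketch below uses only the standard library. It assumes a comma-delimited file with the label in a column named `label`, which matches the `dataset` settings in the config used later. `label_counts` is a hypothetical helper, not an Autolabel function.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def label_counts(path, label_column='label'):\n",
    "    # Hypothetical helper: count rows per label in a CSV file with a header row\n",
    "    with open(path, newline='') as f:\n",
    "        return Counter(row[label_column] for row in csv.DictReader(f))\n",
    "```\n",
    "\n",
    "After `get_data('math')` has run, `label_counts('seed.csv')` returns a `Counter` that maps each label to its number of rows."
   ]
  },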
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "84b014d1-f45c-4479-9acc-0d20870b1786",
   "metadata": {},
   "source": [
    "## Start the labeling process!\n",
    "\n",
    "Labeling with Autolabel is a 3-step process:\n",
    "* First, we specify a labeling configuration (see `config_math.json` below)\n",
    "* Next, we do a dry run on our dataset with the LLM specified in the config by running `agent.plan`, which estimates the cost and shows an example prompt\n",
    "* Finally, we run the labeling with `agent.run`"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "33d47b67-0718-4289-bb59-989d851d09ed",
   "metadata": {},
   "source": [
    "### First labeling run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c093fe91-3508-4140-8bd6-217034e3cce6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "from autolabel import LabelingAgent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c93fae0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the config\n",
    "with open(\"config_math.json\") as f:\n",
    "    config = json.load(f)"
   ]
  },
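  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0003-4d2b-8a6e-1b2c3d4e5f03",
   "metadata": {},
   "source": [
    "The config drives every prompt, so a quick sanity check right after loading can catch mismatches early, such as a label that the guidelines never mention. The checks below are illustrative assumptions about what a well-formed classification config looks like. They are not Autolabel's own validation, and `check_config` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "def check_config(config):\n",
    "    # Hypothetical helper: return a list of human-readable problems (empty if none found)\n",
    "    problems = []\n",
    "    prompt = config.get('prompt', {})\n",
    "    labels = prompt.get('labels', [])\n",
    "    guidelines = prompt.get('task_guidelines', '')\n",
    "    if len(set(labels)) != len(labels):\n",
    "        problems.append('duplicate labels')\n",
    "    for label in labels:\n",
    "        # each label should be explained somewhere in the task description\n",
    "        if label not in guidelines:\n",
    "            problems.append(f'label {label!r} is never mentioned in task_guidelines')\n",
    "    if '{labels}' not in guidelines:\n",
    "        problems.append('task_guidelines has no {labels} placeholder')\n",
    "    if prompt.get('few_shot_num', 0) <= 0:\n",
    "        problems.append('few_shot_num should be positive')\n",
    "    return problems\n",
    "```\n",
    "\n",
    "For the math config loaded above, `check_config(config)` should return an empty list."
   ]
  },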
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1fad4e85-d598-413d-9b01-9690653a05ad",
   "metadata": {},
   "source": [
    "Let's review the configuration file below. You'll notice the following useful keys:\n",
    "* `task_type`: `classification` (since it's a classification task)\n",
    "* `model`: `{'provider': 'openai', 'name': 'gpt-3.5-turbo'}` (use a specific OpenAI model)\n",
    "* `prompt.task_guidelines`: `'You are an expert at understanding math questions...'` (how we describe the task to the LLM)\n",
    "* `prompt.labels`: `['algebra', 'arithmetic', 'measurement', 'numbers', 'probability']` (the full list of labels to choose from)\n",
    "* `prompt.few_shot_num`: 10 (how many labeled examples to provide to the LLM)\n",
    "* `prompt.few_shot_selection`: `semantic_similarity` (pick the seed examples most similar to each input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6a3610fd-721e-44de-9b2c-2cd73ec86bba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'task_name': 'MathQuestionsClassification',\n",
       " 'task_type': 'classification',\n",
       " 'dataset': {'label_column': 'label', 'delimiter': ','},\n",
       " 'model': {'provider': 'openai', 'name': 'gpt-3.5-turbo'},\n",
       " 'prompt': {'task_guidelines': \"You are an expert at understanding math questions. You are supposed to identify the category of a high-school level math question. There are five possible categories (1) algebra (2) arithmetic (3) measurement (4) numbers, and (5) probability. Use the following guidelines: (1) 'algebra' questions will typically contain letter variables and will ask you to find the value of a variable (2) 'arithmetic' questions will ask the sum, difference, multiplication, division, power, square root or value of expressions involving brackets (3) 'measurement' questions are questions that ask to convert a quantity from some unit to some other unit (4) 'numbers' questions will be about bases, remainders, divisors, GCD, LCM etc. (5) 'probability' questions will ask about the probability of the occurrence of something. Make sure you output only one of the following categories: {labels}\",\n",
       "  'output_guidelines': 'You will answer with just the the correct output label and nothing else.',\n",
       "  'labels': ['algebra', 'arithmetic', 'measurement', 'numbers', 'probability'],\n",
       "  'few_shot_examples': 'seed.csv',\n",
       "  'few_shot_selection': 'semantic_similarity',\n",
       "  'few_shot_num': 10,\n",
       "  'example_template': 'Input: {example}\\nOutput: {label}'}}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "config"
   ]
  },
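  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0004-4d2b-8a6e-1b2c3d4e5f04",
   "metadata": {},
   "source": [
    "The example prompt that `agent.plan` prints further down suggests how these keys are combined. Autolabel appears to fill the `{labels}` placeholder in `task_guidelines` with the labels separated by newlines, render each few-shot row with `example_template`, and separate the sections with blank lines. The function below is a simplified, hypothetical reconstruction of that layout for illustration only. Autolabel's actual prompt construction may differ.\n",
    "\n",
    "```python\n",
    "def render_prompt(prompt_cfg, few_shot_rows, new_example):\n",
    "    # Hypothetical sketch of the prompt layout observed in the dry run\n",
    "    template = prompt_cfg['example_template']\n",
    "    parts = [\n",
    "        # {labels} is replaced by the label list, one label per line\n",
    "        prompt_cfg['task_guidelines'].format(labels='\\n'.join(prompt_cfg['labels'])),\n",
    "        prompt_cfg['output_guidelines'],\n",
    "        'Some examples with their output answers are provided below:',\n",
    "    ]\n",
    "    # each few-shot row is a dict with 'example' and 'label' keys\n",
    "    parts += [template.format(**row) for row in few_shot_rows]\n",
    "    # the row to be labeled reuses the template with an empty label\n",
    "    parts.append('Now I want you to label the following example:\\n'\n",
    "                 + template.format(example=new_example, label=''))\n",
    "    return '\\n\\n'.join(parts)\n",
    "```\n",
    "\n",
    "For instance, `render_prompt(config['prompt'], [], 'What is the product of 5 and -1007531?')` reproduces the dry-run layout with the examples section left empty."
   ]
  },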
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "acb4a3de-fa84-4b94-b17a-7a6fac892a1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an agent for labeling\n",
    "agent = LabelingAgent(config=config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "92667a39",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dfdc412b610440cd82723a03a102fa19",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┌──────────────────────────┬─────────┐\n",
       "│<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\"> Total Estimated Cost     </span>│<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\"> $5.8169 </span>│\n",
       "│<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\"> Number of Examples       </span>│<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\"> 2000    </span>│\n",
       "│<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\"> Average cost per example </span>│<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\"> $0.0029 </span>│\n",
       "└──────────────────────────┴─────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┌──────────────────────────┬─────────┐\n",
       "│\u001b[1;35m \u001b[0m\u001b[1;35mTotal Estimated Cost    \u001b[0m\u001b[1;35m \u001b[0m│\u001b[1;32m \u001b[0m\u001b[1;32m$5.8169\u001b[0m\u001b[1;32m \u001b[0m│\n",
       "│\u001b[1;35m \u001b[0m\u001b[1;35mNumber of Examples      \u001b[0m\u001b[1;35m \u001b[0m│\u001b[1;32m \u001b[0m\u001b[1;32m2000   \u001b[0m\u001b[1;32m \u001b[0m│\n",
       "│\u001b[1;35m \u001b[0m\u001b[1;35mAverage cost per example\u001b[0m\u001b[1;35m \u001b[0m│\u001b[1;32m \u001b[0m\u001b[1;32m$0.0029\u001b[0m\u001b[1;32m \u001b[0m│\n",
       "└──────────────────────────┴─────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">───────────────────────────────────────────────── </span>Prompt Example<span style=\"color: #00ff00; text-decoration-color: #00ff00\"> ──────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[92m───────────────────────────────────────────────── \u001b[0mPrompt Example\u001b[92m ──────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">You are an expert at understanding math questions. You are supposed to identify the category of a high-school level\n",
       "math question. There are five possible categories <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"font-weight: bold\">)</span> algebra <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"font-weight: bold\">)</span> arithmetic <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"font-weight: bold\">)</span> measurement <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span><span style=\"font-weight: bold\">)</span> numbers, and <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">)</span> \n",
       "probability. Use the following guidelines: <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">'algebra'</span> questions will typically contain letter variables and will\n",
       "ask you to find the value of a variable <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">'arithmetic'</span> questions will ask the sum, difference, multiplication, \n",
       "division, power, square root or value of expressions involving brackets <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">'measurement'</span> questions are questions \n",
       "that ask to convert a quantity from some unit to some other unit <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">'numbers'</span> questions will be about bases, \n",
       "remainders, divisors, GCD, LCM etc. <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">'probability'</span> questions will ask about the probability of the occurrence of\n",
       "something. Make sure you output only one of the following categories: algebra\n",
       "arithmetic\n",
       "measurement\n",
       "numbers\n",
       "probability\n",
       "\n",
       "You will answer with just the the correct output label and nothing else.\n",
       "\n",
       "Some examples with their output answers are provided below:\n",
       "\n",
       "Input: Work out <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-5</span> * <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-13.786</span>.\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-8</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-10</span><span style=\"font-weight: bold\">)</span>*<span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-6390</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">44304</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9</span>, what is <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-3373</span> + <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-1417</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9</span>, what is <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-3774</span> - <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">22</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the difference between <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-57</span> and <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-18192.801</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-4</span> - <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span><span style=\"font-weight: bold\">)</span> + <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">75</span> + <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-50</span>\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">65</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">260</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"font-weight: bold\">(((</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-65</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">360</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-13</span><span style=\"font-weight: bold\">))</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">13</span>, what is <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-89</span> - <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2731</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">15</span>, what is -3962b - <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9</span>/<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">6</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"font-weight: bold\">((</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-2247</span><span style=\"font-weight: bold\">)</span><span style=\"color: #800080; text-decoration-color: #800080\">/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">1712</span><span style=\"font-weight: bold\">)</span>?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Now I want you to label the following example:\n",
       "Input: What is the product of <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span> and <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-1007531</span>?\\n\n",
       "Output: \n",
       "</pre>\n"
      ],
      "text/plain": [
       "You are an expert at understanding math questions. You are supposed to identify the category of a high-school level\n",
       "math question. There are five possible categories \u001b[1m(\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m)\u001b[0m algebra \u001b[1m(\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1m)\u001b[0m arithmetic \u001b[1m(\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m)\u001b[0m measurement \u001b[1m(\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m)\u001b[0m numbers, and \u001b[1m(\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m)\u001b[0m \n",
       "probability. Use the following guidelines: \u001b[1m(\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m)\u001b[0m \u001b[32m'algebra'\u001b[0m questions will typically contain letter variables and will\n",
       "ask you to find the value of a variable \u001b[1m(\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1m)\u001b[0m \u001b[32m'arithmetic'\u001b[0m questions will ask the sum, difference, multiplication, \n",
       "division, power, square root or value of expressions involving brackets \u001b[1m(\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m)\u001b[0m \u001b[32m'measurement'\u001b[0m questions are questions \n",
       "that ask to convert a quantity from some unit to some other unit \u001b[1m(\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m)\u001b[0m \u001b[32m'numbers'\u001b[0m questions will be about bases, \n",
       "remainders, divisors, GCD, LCM etc. \u001b[1m(\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m)\u001b[0m \u001b[32m'probability'\u001b[0m questions will ask about the probability of the occurrence of\n",
       "something. Make sure you output only one of the following categories: algebra\n",
       "arithmetic\n",
       "measurement\n",
       "numbers\n",
       "probability\n",
       "\n",
       "You will answer with just the the correct output label and nothing else.\n",
       "\n",
       "Some examples with their output answers are provided below:\n",
       "\n",
       "Input: Work out \u001b[1;36m-5\u001b[0m * \u001b[1;36m-13.786\u001b[0m.\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of \u001b[1m(\u001b[0m\u001b[1;36m-8\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[1m(\u001b[0m\u001b[1;36m-10\u001b[0m\u001b[1m)\u001b[0m*\u001b[1m(\u001b[0m\u001b[1;36m-6390\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[95m44304\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base \u001b[1;36m9\u001b[0m, what is \u001b[1;36m-3373\u001b[0m + \u001b[1;36m-1417\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base \u001b[1;36m9\u001b[0m, what is \u001b[1;36m-3774\u001b[0m - \u001b[1;36m22\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the difference between \u001b[1;36m-57\u001b[0m and \u001b[1;36m-18192.801\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: \u001b[1m(\u001b[0m\u001b[1;36m-4\u001b[0m - \u001b[1;36m4\u001b[0m\u001b[1m)\u001b[0m + \u001b[1;36m75\u001b[0m + \u001b[1;36m-50\u001b[0m\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of \u001b[1m(\u001b[0m\u001b[1;36m65\u001b[0m/\u001b[1;36m260\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[1m(\u001b[0m\u001b[1m(\u001b[0m\u001b[1m(\u001b[0m\u001b[1;36m-65\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[95m360\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[1m(\u001b[0m\u001b[1;36m-13\u001b[0m\u001b[1m)\u001b[0m\u001b[1m)\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base \u001b[1;36m13\u001b[0m, what is \u001b[1;36m-89\u001b[0m - \u001b[1;36m2731\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: In base \u001b[1;36m15\u001b[0m, what is -3962b - \u001b[1;36m27\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Input: What is the value of \u001b[1m(\u001b[0m\u001b[1;36m9\u001b[0m/\u001b[1;36m6\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[1m(\u001b[0m\u001b[1m(\u001b[0m\u001b[1;36m-2247\u001b[0m\u001b[1m)\u001b[0m\u001b[35m/\u001b[0m\u001b[95m1712\u001b[0m\u001b[1m)\u001b[0m?\\n\n",
       "Output: arithmetic\n",
       "\n",
       "Now I want you to label the following example:\n",
       "Input: What is the product of \u001b[1;36m5\u001b[0m and \u001b[1;36m-1007531\u001b[0m?\\n\n",
       "Output: \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #00ff00; text-decoration-color: #00ff00\">───────────────────────────────────────────────────────────────────────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[92m───────────────────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# dry-run -- this tells us how much this will cost and shows an example prompt\n",
    "from autolabel import AutolabelDataset\n",
    "\n",
    "ds = AutolabelDataset(\"test.csv\", config=config)\n",
    "agent.plan(ds)"
   ]
  },
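  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0005-4d2b-8a6e-1b2c3d4e5f05",
   "metadata": {},
   "source": [
    "All ten few-shot examples in the dry-run prompt above are arithmetic questions, like the input being labeled. This is consistent with `few_shot_selection: semantic_similarity`, which picks the `few_shot_num` seed rows closest to the input. Autolabel is expected to measure closeness with text embeddings. The sketch below swaps in bag-of-words cosine similarity so that it runs with the standard library alone. `select_few_shot` and its helpers are hypothetical names.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import re\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def bag_of_words(text):\n",
    "    # lower-cased word counts; a crude stand-in for a learned text embedding\n",
    "    return Counter(re.findall('[a-z]+', text.lower()))\n",
    "\n",
    "\n",
    "def cosine(a, b):\n",
    "    # cosine similarity between two sparse count vectors\n",
    "    dot = sum(count * b[word] for word, count in a.items())\n",
    "    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))\n",
    "    return dot / norm if norm else 0.0\n",
    "\n",
    "\n",
    "def select_few_shot(seed_rows, query, k=10):\n",
    "    # Hypothetical helper: return the k seed rows whose 'example' text is closest to the query\n",
    "    q = bag_of_words(query)\n",
    "    ranked = sorted(seed_rows, key=lambda row: cosine(bag_of_words(row['example']), q), reverse=True)\n",
    "    return ranked[:k]\n",
    "```\n",
    "\n",
    "Given seed rows read with `csv.DictReader` from `seed.csv`, `select_few_shot(rows, 'What is the product of 5 and -1007531?')` returns the ten questions with the most word overlap. A real embedding model would also capture similarity that does not depend on shared words."
   ]
  },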
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "dd703025-54d8-4349-b0d6-736d2380e966",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "8fa0f267bdd54c8d96bcb89bbad43e4c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">classification_report:\n",
       "              precision    recall  f1-score   support\n",
       "\n",
       "     algebra       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.92</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.96</span>        <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">24</span>\n",
       "  arithmetic       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.85</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.89</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.87</span>        <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">19</span>\n",
       " measurement       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>        <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">19</span>\n",
       "     numbers       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.82</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.90</span>        <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">17</span>\n",
       " probability       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span>        <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">21</span>\n",
       "\n",
       "    accuracy                           <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span>\n",
       "   macro avg       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.94</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span>\n",
       "weighted avg       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>      <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.95</span>       <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span>\n",
       "\n",
       "</pre>\n"
      ],
      "text/plain": [
       "classification_report:\n",
       "              precision    recall  f1-score   support\n",
       "\n",
       "     algebra       \u001b[1;36m0.92\u001b[0m      \u001b[1;36m1.00\u001b[0m      \u001b[1;36m0.96\u001b[0m        \u001b[1;36m24\u001b[0m\n",
       "  arithmetic       \u001b[1;36m0.85\u001b[0m      \u001b[1;36m0.89\u001b[0m      \u001b[1;36m0.87\u001b[0m        \u001b[1;36m19\u001b[0m\n",
       " measurement       \u001b[1;36m1.00\u001b[0m      \u001b[1;36m1.00\u001b[0m      \u001b[1;36m1.00\u001b[0m        \u001b[1;36m19\u001b[0m\n",
       "     numbers       \u001b[1;36m1.00\u001b[0m      \u001b[1;36m0.82\u001b[0m      \u001b[1;36m0.90\u001b[0m        \u001b[1;36m17\u001b[0m\n",
       " probability       \u001b[1;36m1.00\u001b[0m      \u001b[1;36m1.00\u001b[0m      \u001b[1;36m1.00\u001b[0m        \u001b[1;36m21\u001b[0m\n",
       "\n",
       "    accuracy                           \u001b[1;36m0.95\u001b[0m       \u001b[1;36m100\u001b[0m\n",
       "   macro avg       \u001b[1;36m0.95\u001b[0m      \u001b[1;36m0.94\u001b[0m      \u001b[1;36m0.95\u001b[0m       \u001b[1;36m100\u001b[0m\n",
       "weighted avg       \u001b[1;36m0.95\u001b[0m      \u001b[1;36m0.95\u001b[0m      \u001b[1;36m0.95\u001b[0m       \u001b[1;36m100\u001b[0m\n",
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Actual Cost: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.0912</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Actual Cost: \u001b[1;36m0.0912\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> accuracy </span>┃<span style=\"font-weight: bold\"> support </span>┃<span style=\"font-weight: bold\"> completion_rate </span>┃\n",
       "┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n",
       "│<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\"> 0.95     </span>│<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\"> 100     </span>│<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\"> 1.0             </span>│\n",
       "└──────────┴─────────┴─────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1maccuracy\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1msupport\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mcompletion_rate\u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n",
       "│\u001b[1;36m \u001b[0m\u001b[1;36m0.95    \u001b[0m\u001b[1;36m \u001b[0m│\u001b[1;36m \u001b[0m\u001b[1;36m100    \u001b[0m\u001b[1;36m \u001b[0m│\u001b[1;36m \u001b[0m\u001b[1;36m1.0            \u001b[0m\u001b[1;36m \u001b[0m│\n",
       "└──────────┴─────────┴─────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# now, do the actual labeling\n",
    "ds = agent.run(ds, max_items=100)"
   ]
  },
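  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3f9c1a7e-0006-4d2b-8a6e-1b2c3d4e5f06",
   "metadata": {},
   "source": [
    "Two quick consistency checks can be run on the numbers above:\n",
    "\n",
    "* Multiplying each class's recall by its support recovers the number of correctly labeled rows. These counts should add up to the reported accuracy.\n",
    "* Scaling the actual cost of this 100-row run to all 2000 rows gives a rough projection to compare with the `agent.plan` estimate.\n",
    "\n",
    "The figures are copied from the outputs. The linear cost projection is an assumption, since per-row cost varies with prompt length.\n",
    "\n",
    "```python\n",
    "# Per-class (recall, support) pairs copied from the classification report above\n",
    "report = {\n",
    "    'algebra': (1.00, 24),\n",
    "    'arithmetic': (0.89, 19),\n",
    "    'measurement': (1.00, 19),\n",
    "    'numbers': (0.82, 17),\n",
    "    'probability': (1.00, 21),\n",
    "}\n",
    "\n",
    "# recall * support recovers the number of correctly labeled rows per class\n",
    "correct = sum(round(recall * support) for recall, support in report.values())\n",
    "total = sum(support for _, support in report.values())\n",
    "accuracy = correct / total\n",
    "\n",
    "# Scale the observed cost of the 100-row run linearly to the full 2000-row dataset\n",
    "actual_cost_100 = 0.0912\n",
    "estimated_cost_2000 = 5.8169\n",
    "projected_cost_2000 = actual_cost_100 / 100 * 2000\n",
    "\n",
    "print(f'accuracy: {correct}/{total} = {accuracy:.2f}')\n",
    "print(f'projected cost for 2000 rows: ${projected_cost_2000:.2f} (dry-run estimate: ${estimated_cost_2000:.2f})')\n",
    "```\n",
    "\n",
    "With these numbers, the correct counts add up to 95 of 100 rows, which matches the reported accuracy of 0.95. The projected full-run cost is about $1.82, roughly a third of the $5.8169 dry-run estimate."
   ]
  },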
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a0a4bc7-b62e-4ad3-96a5-9e04fe53c24b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
